The idea of encoding information on quantum bearers and processing it quantum-mechanically
has revolutionized our world and brought mankind to the verge of an enigmatic era of quantum
technologies. Inspired by this idea, in the present paper we search for advantages of quantum
information processing in the field of machine learning. We show that the simplest learning
machine, the perceptron, can dramatically increase its learning capabilities if it operates
according to the laws of quantum mechanics. Exploiting only basic properties of the Hilbert
space, the superposition principle of quantum mechanics, and quantum measurements to
introduce a quantum perceptron, we demonstrate, for instance, that it is able to learn an
arbitrary (Boolean) logical function, while this learning task cannot be performed by its
classical counterpart. The quantum perceptron learning rule, moreover, does not require any
optimization procedure, which is necessary for classical learning models.

PACS numbers: 03.67.-a, 87.19.11, 87.19.lv 



During the last few decades, we have been witnessing the unification of quantum physics
and classical information science, which has resulted in the constitution of new
disciplines: quantum information and quantum computation [J. While processing of
information that is encoded in systems exhibiting quantum properties suggests, for
example, unconditionally secure quantum communication [5] and superdense coding,
computers that operate according to the laws of quantum mechanics offer efficient
solutions to problems that are intractable on conventional computers. Having paramount
practical importance, these announced technological benefits have indicated the main
directions of research in the field of quantum information and quantum computation,
somehow leaving aside other potential applications of quantum physics in information
science. So far, for instance, very little attention has been paid to possible advantages
of quantum information processing in such areas of modern information science as machine
learning and artificial intelligence j6, . Although machine learning governed by quantum
mechanics has been demonstrated to have certain advantages over classical learning
[71IH], these advantages are strongly coupled with a more sophisticated optimization
procedure than in the classical case. This paper, in contrast, presents a new approach to
machine learning which does not require any optimization at all.

Our focus is on the perceptron, which is the simplest learning machine. The perceptron is
a model of a neuron that was originally introduced by Rosenblatt [5] to perform visual
perception tasks, which, in mathematical terms, amount to solving the linear
classification problem. There are two essential stages of perceptron functioning: a
supervised learning session and the classification of new data. During the first stage,
the perceptron is given a labeled set of examples. Its task is to infer the weights of a
linear function according to some error-correcting rule. Subsequently, this function is
utilized for the classification of new, previously unseen data. In spite of its very
simple learning rule and internal structure, the perceptron's capabilities are seriously
limited [10]. The perceptron cannot provide the classification if there is an overlap in
the data or if the data cannot be linearly separated. It is also incapable of learning
complex logical functions, such as the XOR function. Moreover, by its design, the
perceptron can distinguish only two classes and, therefore, cannot resolve the situation
when the input belongs to neither of the two classes.

In this paper we show that all the mentioned problems can, in principle, be overcome by a
quantum perceptron. There are also two operational stages for the quantum perceptron.
During the learning stage all the data are formally represented through quantum states of
physical systems. This representation allows expanding the data space to a physical
Hilbert space. It is important to note that there is no need to involve real physical
systems during this stage. Thus, the learning is essentially a classical procedure. The
subject of the learning is a set of positive operator valued measurements (POVM) [1]. The
set is constructed by making superpositions of the training data in such a way that each
operator detects one particular class. This procedure is linear and does not require
solving equations or optimizing parameters. When the learning is over, real quantum
systems come into play: new data are encoded into the states of the quantum systems,
which are measured with the POVM. Based on the results of the measurements, the required
classification is achieved.

In the following, we shall briefly review the classical perceptron and discuss the origin
of the restrictions on its learning capabilities. After this, we shall introduce a
quantum perceptron and show how it can overcome the restrictions, using the example of
learning the XOR function. Finally, we shall discuss potential capabilities of the
concept of the quantum perceptron.

FIG. 1: A schematic representation of a classical perceptron with three input features.

The operational structure of the classical perceptron is simple. Given an input vector
$\mathbf{x}$ (usually called a feature vector) consisting of $n$ features, the perceptron
computes a weighted sum of its components, $f(\mathbf{x}) = \sum_i a_i x_i$, where the
weights $a_i$ have been previously learned. The output of the perceptron is given by
$o = \mathrm{sign}(f(\mathbf{x}))$, where sign is the Heaviside function

$$
\mathrm{sign}(y) = \begin{cases} +1, & y > 0 \\ -1, & y < 0 \end{cases} \tag{1}
$$



Depending on the binary output signal $o = \{+1, -1\}$, the input feature vector
$\mathbf{x}$ is assigned to one of two feature classes, one of which is associated with
the output $o = +1$ and the other with the output $o = -1$. A standard graphical
representation of a perceptron is given in Fig. 1. As we have mentioned above, the
perceptron needs to be trained before its autonomous operation. During the training, a
set of $P$ training data pairs $\{\mathbf{x}_i, d_i, i = 1, \dots, P\}$ is given, where
$\mathbf{x}_i$ are the $n$-dimensional feature vectors and $d_i$ are the desired binary
outputs. Typically, at the beginning of the learning procedure the initial weights
$\mathbf{a}$ of the linear function are generated randomly. When a data pair is chosen
from the training set, the output $o_i = \mathrm{sign}(f(\mathbf{x}_i))$ is computed from
the input feature vector $\mathbf{x}_i$ and is compared to the desired output $d_i$. If
the actual and the desired outputs match, $o_i = d_i$, the weights $\mathbf{a}$ are left
unchanged and the next pair from the data set is taken for analysis. If $o_i \neq d_i$,
the weights $\mathbf{a}$ of the linear function are changed according to the
error-correcting rule
$\mathbf{a}' = \mathbf{a} + \epsilon\mathbf{a} = \mathbf{a} + (d_i - o_i)\mathbf{x}_i$.
The error-correcting rule is applied until the condition $o_i = d_i$ is met.
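
The training loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a reference implementation: the names `sign` and `train_perceptron`, the parameters `max_epochs` and `seed`, the constant bias feature appended to each input, and the convention that $y = 0$ is mapped to $-1$ are all choices made here and do not come from the text. The rule is applied in repeated sweeps over the training set until one full sweep produces no error.

```python
import numpy as np


def sign(y):
    """Output rule of Eq. (1); y == 0 is mapped to -1 here (an assumption)."""
    return 1 if y > 0 else -1


def train_perceptron(X, d, max_epochs=100, seed=0):
    """Apply the rule a' = a + (d_i - o_i) x_i in sweeps over the data.

    Returns the weights and a flag telling whether a full sweep finished
    without errors. `max_epochs` and `seed` are illustrative parameters.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=X.shape[1])          # random initial weights
    for _ in range(max_epochs):
        errors = 0
        for x_i, d_i in zip(X, d):
            o_i = sign(a @ x_i)              # actual output
            if o_i != d_i:                   # mismatch: correct the weights
                a = a + (d_i - o_i) * x_i
                errors += 1
        if errors == 0:
            return a, True
    return a, False


# Two binary features plus a constant 1 (bias feature, an assumption).
X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
d_and = np.array([-1, -1, -1, +1])           # AND: linearly separable
d_xor = np.array([-1, +1, +1, -1])           # XOR: not linearly separable

a_and, ok_and = train_perceptron(X, d_and)
a_xor, ok_xor = train_perceptron(X, d_xor)
```

With these inputs the loop settles on weights for AND, whereas for XOR it exhausts `max_epochs` without ever completing an error-free sweep.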

The training procedure has a clear geometric interpretation. The weights $\mathbf{a}$ of
the linear function define an $(n-1)$-dimensional hyperplane in the $n$-dimensional
feature space. The training procedure results in a hyperplane that divides the feature
space into two subspaces, so that each feature class occupies one of the subspaces. This
interpretation makes the origin of the restrictions on the learning capabilities of the
classical perceptron visible: a hyperplane that separates the two classes may not exist.
An example of two classes that cannot be linearly separated is the XOR logical function
of two variables



$$
\begin{array}{c|cccc}
x_1 & 0 & 0 & 1 & 1 \\
x_2 & 0 & 1 & 0 & 1 \\
\hline
f & 0 & 1 & 1 & 0 \\
o & -1 & +1 & +1 & -1
\end{array} \tag{2}
$$

A representation of this function in the two-dimensional feature space is shown in
Fig. 2.

FIG. 2: The feature space of the XOR function is two-dimensional and discrete (each
feature takes only the values 0 and 1). There is no line (a one-dimensional hyperplane)
that separates the 0s and the 1s. The classical perceptron is incapable of classifying
the input feature vectors and, therefore, cannot learn the XOR function.
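
The absence of a separating line can also be checked with a short worked argument. The constant offset $b$ below is an assumption added here for generality; setting $b = 0$ recovers the weighted sum used in the text. For the XOR function, the two inputs with $o = +1$ and the two inputs with $o = -1$ would have to satisfy:

```latex
\begin{align*}
f(\mathbf{x}) &= a_1 x_1 + a_2 x_2 + b, \\
f(0,1) > 0,\ f(1,0) > 0 \quad &\Rightarrow \quad f(0,1) + f(1,0) = a_1 + a_2 + 2b > 0, \\
f(0,0) < 0,\ f(1,1) < 0 \quad &\Rightarrow \quad f(0,0) + f(1,1) = a_1 + a_2 + 2b < 0.
\end{align*}
```

Both sums equal the same quantity $a_1 + a_2 + 2b$, which cannot be positive and negative at once, so no choice of weights reproduces XOR.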

Understanding the principles of classical perceptron 
functioning, we are ready to move forward and introduce 
quantum perceptron. In order to simplify our discussion, 
let us first consider a particular classification task - XOR 
function learning. 

Like its classical counterpart, the quantum perceptron is to be trained to perform the
classification task. Suppose we are given a set of four training data pairs
$\{\mathbf{x}_i, d_i, i = 1, \dots, 4\}$, where the feature vector consists of two
features, $\mathbf{x} = \{x_1, x_2\}$, and the desired output $d = \{+1, -1\}$ is a binary
function. Let us represent the input features through the states of a two-dimensional
quantum system, a qubit, so that each feature is given by one of the basis states
$|x_i\rangle = \{|0\rangle, |1\rangle\}$ for $i = 1, 2$, where $\{|0\rangle, |1\rangle\}$
denotes an orthonormal computational basis [1]. The quantum representation allows
extending the two-dimensional feature space to the four-dimensional Hilbert space of the
two-qubit system. In the above representation, the feature vector $\mathbf{x}$ is given
by one of the four two-qubit states $|x_1, x_2\rangle$.

During the learning, for a given feature vector $|x_1, x_2\rangle$ and desired output
$d$, let us find a vector $|\psi\rangle$ from the condition
$\langle\psi|x_1, x_2\rangle = |d|$ and construct an operator
$P_d = |\psi\rangle\langle\psi|$, where the modulus $|d|$ is taken in order to avoid
construction of unphysical (negative) operators. Repeating this procedure for all data
from the training data set, let us sum and normalize all operators that belong to
$d = -1$ and $d = +1$. As a result of learning the four data pairs, we have two operators



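$$
\begin{aligned}
P_{-1} &= |0,0\rangle\langle 0,0| + |1,1\rangle\langle 1,1| , \\
P_{+1} &= |0,1\rangle\langle 0,1| + |1,0\rangle\langle 1,0| .
\end{aligned} \tag{3}
$$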



It is easy to check that the operators $P_{-1}$ and $P_{+1}$ are orthogonal,
$P_{-1}P_{+1} = 0$, and form a complete set, $P_{-1} + P_{+1} = I$, where $I$ is the
identity operator. During its autonomous operation, the quantum perceptron may be given a
two-qubit system prepared in one of the four states
$|x_1, x_2\rangle = \{|0,0\rangle, |0,1\rangle, |1,0\rangle, |1,1\rangle\}$. With the
help of the operator $P_{-1}$, the states $|0,0\rangle$ and $|1,1\rangle$ are measured
and assigned the class $d = -1$, while the states $|0,1\rangle$ and $|1,0\rangle$ are
detected by the operator $P_{+1}$ and classified as $d = +1$. The fact that the operators
$P_{-1}$ and $P_{+1}$ are orthogonal ensures zero probability of misclassification, while
completeness of the set of operators guarantees classification of any input. Thus, the
quantum perceptron has learned the XOR function.
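
As a numerical illustration of this construction, the following sketch builds the two operators from the four XOR training pairs and checks orthogonality and completeness. It is a classical simulation with NumPy and rests on assumptions made here: the function names `ket`, `learn_operators` and `outcome_probabilities` are hypothetical, the basis ordering $|x_1, x_2\rangle \to$ index $2x_1 + x_2$ is a convention, and the normalization step is omitted because every training state appears exactly once.

```python
import numpy as np


def ket(x1, x2):
    """Computational-basis vector |x1, x2> of the two-qubit Hilbert space."""
    v = np.zeros(4)
    v[2 * x1 + x2] = 1.0
    return v


def learn_operators(training_pairs):
    """Sum the projectors |psi><psi| separately for d = -1 and d = +1."""
    ops = {-1: np.zeros((4, 4)), +1: np.zeros((4, 4))}
    for (x1, x2), d in training_pairs:
        psi = ket(x1, x2)                 # satisfies <psi|x1,x2> = |d| = 1
        ops[d] += np.outer(psi, psi)
    return ops


def outcome_probabilities(ops, x1, x2):
    """Born-rule probabilities <x|P_d|x> for the input state |x1, x2>."""
    x = ket(x1, x2)
    return {d: float(x @ op @ x) for d, op in ops.items()}


xor_pairs = [((0, 0), -1), ((0, 1), +1), ((1, 0), +1), ((1, 1), -1)]
P = learn_operators(xor_pairs)

orthogonal = np.allclose(P[-1] @ P[+1], 0.0)       # P_{-1} P_{+1} = 0
complete = np.allclose(P[-1] + P[+1], np.eye(4))   # P_{-1} + P_{+1} = I
```

Calling `outcome_probabilities(P, 0, 1)` assigns probability one to the class $d = +1$, in line with the zero misclassification probability stated above.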

The successful learning of the XOR function by the quantum perceptron is a consequence of
the representation of the classical feature vector $\mathbf{x}$ through the two-qubit
states. In the classical representation, the feature vectors cannot be linearly separated
on a plane, see Fig. 2. In the quantum representation, the four mutually orthogonal
states $|x_1, x_2\rangle$ in the four-dimensional Hilbert space can be separated into two
classes in an arbitrary fashion. This implies that an arbitrary logical function of two
variables can be learned by the quantum perceptron. For example, learning of the logical
AND function leads to the construction of the operators
$P_{-1} = |0,0\rangle\langle 0,0| + |0,1\rangle\langle 0,1| + |1,0\rangle\langle 1,0|$
and $P_{+1} = |1,1\rangle\langle 1,1|$. Moreover, an arbitrary logical function of an
arbitrary number of inputs can also be learned by the quantum perceptron, because the
number of possible inputs of such a function grows exponentially with the order of the
function, exactly as fast as the dimensionality of the Hilbert space that is needed to
represent the logical function. It is very important to note that, in spite of the
exponential growth of the Hilbert space, the number of qubits required for the learning
grows linearly, and there are always just two (or, as we will see later, three) operators
to measure during autonomous work of the perceptron.
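
The same construction can be sketched for a Boolean function of any number of inputs. The code below is an illustrative classical simulation, and the helper name `learn_boolean_function` and the basis ordering (bit string read as a binary number) are assumptions made here. Note that storing the operators as dense matrices costs memory exponential in the number of inputs; this is a property of the classical simulation only and says nothing about the qubit count.

```python
import itertools

import numpy as np


def learn_boolean_function(f, n):
    """Build P_{-1} and P_{+1} for a Boolean function f of n inputs."""
    dim = 2 ** n
    ops = {-1: np.zeros((dim, dim)), +1: np.zeros((dim, dim))}
    for bits in itertools.product((0, 1), repeat=n):
        index = int("".join(str(b) for b in bits), 2)  # position of |bits>
        d = +1 if f(*bits) else -1
        ops[d][index, index] = 1.0                     # adds |bits><bits|
    return ops


# AND of two inputs, and the parity of three inputs, as examples.
P_and = learn_boolean_function(lambda x1, x2: x1 and x2, 2)
P_parity = learn_boolean_function(lambda a, b, c: (a + b + c) % 2, 3)
```

For AND this reproduces the operators quoted in the text: `P_and[+1]` is the projector onto $|1,1\rangle$ and `P_and[-1]` covers the remaining three basis states.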

From the above example we have seen how the quantum representation helps to learn the
logical XOR function, but the role of quantum measurements was not evident. Let us
slightly modify the problem of XOR learning to explain the reason for introducing quantum
measurements into the functioning of the quantum perceptron. In real-life learning tasks
the training data may be corrupted by noise [5]. In some cases, noise may lead to
overlapping of the training data, which results in misclassification of feature vectors
during the training stage and during further autonomous functioning. For example, if,
during the XOR learning, there is a finite small probability $\delta$ that the feature
$x_1$ takes a wrong binary value, while the other feature and the desired output are not
affected by noise, then after a large number of training runs (which are usually required
when learning from noisy data), the operators $P_{-1}$ and $P_{+1}$ are given by

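$$
\begin{aligned}
P'_{-1} &= (1-\delta)\left(|0,0\rangle\langle 0,0| + |1,1\rangle\langle 1,1|\right)
          + \delta\left(|0,1\rangle\langle 0,1| + |1,0\rangle\langle 1,0|\right) , \\
P'_{+1} &= (1-\delta)\left(|0,1\rangle\langle 0,1| + |1,0\rangle\langle 1,0|\right)
          + \delta\left(|0,0\rangle\langle 0,0| + |1,1\rangle\langle 1,1|\right) .
\end{aligned} \tag{4}
$$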

These operators $P'_{-1}$ and $P'_{+1}$ form a complete set, $P'_{-1} + P'_{+1} = I$, but
they are no longer orthogonal, $P'_{-1}P'_{+1} \neq 0$. This means that during autonomous
operation of the quantum perceptron, the input feature vectors are misclassified with
probability $\delta$. Nevertheless, on average, most of the feature vectors are
classified correctly. This means that the quantum perceptron simulates the XOR function
with a degree of accuracy given by $1 - \delta$. At this point one may think of an
error-correcting procedure to transform the operators $P'_{-1}$ and $P'_{+1}$ into the
operators $P_{-1}$ and $P_{+1}$, which may be an orthogonalization of two diagonal
operators. However, we prefer to look at the result of learning from the noisy training
data from a slightly different perspective. When the quantum perceptron is trained on the
noisy data, it can exactly (in a probabilistic sense) reproduce the fluctuations that
have been observed during the training. This ability may be useful in some cases. At
least, the classical perceptron cannot do anything like this.

From the above discussion we can conclude that quantum measurements, being intrinsically
probabilistic, allow probabilistic classification of the feature vectors when there is
some nonzero probability of misclassification. In other words, owing to quantum
measurements, the quantum perceptron is capable of performing classification when there
is an overlap in the data.
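
The noisy case can be illustrated numerically as well. The sketch below writes down the operators of Eq. (4) directly for a chosen value of $\delta$ and then samples measurement outcomes for one input state. It is a classical simulation under assumptions made here: the names `projector`, `noisy_xor_operators` and `sample_class`, the value $\delta = 0.1$, the sample size and the random seed are all illustrative choices.

```python
import numpy as np


def projector(x1, x2):
    """Projector |x1, x2><x1, x2| in the two-qubit computational basis."""
    v = np.zeros(4)
    v[2 * x1 + x2] = 1.0
    return np.outer(v, v)


def noisy_xor_operators(delta):
    """Operators of Eq. (4) for flip probability delta of the feature x1."""
    even = projector(0, 0) + projector(1, 1)   # noise-free P_{-1}
    odd = projector(0, 1) + projector(1, 0)    # noise-free P_{+1}
    return {-1: (1 - delta) * even + delta * odd,
            +1: (1 - delta) * odd + delta * even}


def sample_class(ops, x1, x2, rng):
    """Draw one measurement outcome d for the input state |x1, x2>."""
    index = 2 * x1 + x2
    p_plus = ops[+1][index, index]             # <x|P'_{+1}|x>
    return +1 if rng.random() < p_plus else -1


delta = 0.1
P_noisy = noisy_xor_operators(delta)
rng = np.random.default_rng(0)
outcomes = [sample_class(P_noisy, 0, 1, rng) for _ in range(10_000)]
error_rate = outcomes.count(-1) / len(outcomes)   # fraction of wrong labels
```

In this sketch the product $P'_{-1}P'_{+1}$ equals $\delta(1-\delta)I$, which is nonzero for $0 < \delta < 1$, and the sampled error rate for the input $|0,1\rangle$ fluctuates around $\delta$.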

Now we have sufficient background to analyze the full power of the concept of the quantum
perceptron. Let us have a closer look at the learning stage. Given a set of $P$ training
data pairs $\{\mathbf{x}_i, d_i, i = 1, \dots, P\}$, we need to represent the input
features through the states of a quantum system. This is indeed the most difficult part.
In the case of Boolean functions the quantum representation through qubits is intuitively
understandable. In general, classification tasks may be very different in origin;
therefore we do not have a general recipe for constructing a quantum representation for a
given (real) $n$-dimensional feature space $\mathbb{R}^n$. The main requirement for the
construction is that the quantum representation must have all the topological properties
of the original feature manifold.

But suppose we have constructed a quantum representation and found the operators
$P_d = |\psi\rangle\langle\psi|$ according to the rule
$\langle\psi|x_1, x_2\rangle = |d|$. Summing the corresponding operators, we eventually
get two operators $P_{-1}$ and $P_{+1}$. Each of these operators takes into account all
training data that belong to the corresponding output. Therefore, by their construction
and due to the superposition principle of quantum mechanics, the operator $P_{-1}$ can
detect any previously seen feature vector (that corresponds to the output $o = -1$) and
an arbitrary linear (convex) combination of such feature vectors. The same applies to the
operator $P_{+1}$. There are only four possibilities of how the learning stage may end:

• The operators $P_{-1}$ and $P_{+1}$ are orthogonal, $P_{-1}P_{+1} = 0$, and form a
complete set, $P_{-1} + P_{+1} = I$. We have met this case when analyzing XOR learning
without noise: there is no overlap in the data and each input feature vector can be
assigned to one of the two classes without error.

• The operators $P_{-1}$ and $P_{+1}$ are orthogonal, $P_{-1}P_{+1} = 0$, but do not form
a complete set, $P_{-1} + P_{+1} \neq I$. This is an extremely interesting case. We can
define a third operator $P_0 = I - P_{-1} - P_{+1}$, which is orthogonal to $P_{-1}$ and
$P_{+1}$, because $P_{-1}P_{+1} = 0$. During its autonomous functioning, the quantum
perceptron generates three outputs $d = \{+1, 0, -1\}$, namely that the feature vector
belongs to one of the previously seen classes, $d = \{+1, -1\}$, or that it is
essentially different from the learned classes, $d = 0$, that is, it belongs to a new,
previously unseen class [11]. Classification into previously unseen classes is an
extremely hard learning problem, which cannot be solved even by most classical perceptron
networks [5, 10]. The quantum perceptron is capable of performing this task without
mistakes due to the orthogonality of the operators $P_{-1}$, $P_{+1}$ and $P_0$.

• The operators $P_{-1}$ and $P_{+1}$ are not orthogonal, $P_{-1}P_{+1} \neq 0$, but form
a complete set, $P_{-1} + P_{+1} = I$. This is the case of the noisy XOR learning: all
the data can be classified into the two classes with some nonzero probability of error.

• The most general case is when the operators $P_{-1}$ and $P_{+1}$ are not orthogonal,
$P_{-1}P_{+1} \neq 0$, and do not form a complete set, $P_{-1} + P_{+1} \neq I$. We can
again define a third operator $P_0 = I - P_{-1} - P_{+1}$, which this time is not
orthogonal to $P_{-1}$ and $P_{+1}$. In this situation, the quantum perceptron classifies
all the input feature vectors into three classes, one of which is a new class, with some
nonzero probability of error.
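
The four cases above differ only in two checks, orthogonality and completeness, plus the definition of the third operator. A minimal sketch of these checks is given below; the function names `analyse_operators` and `basis_projector` and the incomplete two-pair training set are hypothetical examples introduced here, not taken from the text.

```python
import numpy as np


def analyse_operators(P_minus, P_plus):
    """Report orthogonality, completeness and the third operator P_0."""
    identity = np.eye(P_minus.shape[0])
    orthogonal = np.allclose(P_minus @ P_plus, 0.0)
    complete = np.allclose(P_minus + P_plus, identity)
    P_zero = identity - P_minus - P_plus
    return orthogonal, complete, P_zero


def basis_projector(index, dim=4):
    """Projector onto the computational-basis state with the given index."""
    P = np.zeros((dim, dim))
    P[index, index] = 1.0
    return P


# Hypothetical incomplete training set: only |0,0> (d = -1) and |0,1> (d = +1).
P_minus = basis_projector(0)
P_plus = basis_projector(1)
orthogonal, complete, P_zero = analyse_operators(P_minus, P_plus)

# Probability that the unseen input |1,1> (index 3) yields the outcome d = 0.
p_new_class = P_zero[3, 3]
```

This example falls into the second case: the operators are orthogonal but not complete, and the unseen state $|1,1\rangle$ is assigned to the new class $d = 0$ with probability one.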

When the training stage is over, the only possible restriction on practical
implementations of the quantum perceptron is our ability to prepare and measure quantum
systems. In the case of learning logical functions we need to deal with qubits, which may
be electrons, nuclear or molecular spins, quantum dots or photons. Being at the heart of
quantum information science, the art of preparation and detection of such systems has
reached unprecedented heights. Current technologies, for example, allow us to handle 10^
photonic qubits [T^]. Therefore, we do not see practical limitations on the
implementation of the quantum perceptron, at least for the task of learning logical
functions.

In conclusion, bridging quantum information science and machine learning theory, we have
shown how the capabilities of a learning machine can be dramatically increased if it
operates according to the laws of quantum mechanics. We introduced the quantum perceptron
and argued that it can potentially perform tasks that are impossible for its classical
counterpart: learning arbitrary logical functions, classification of data with an
overlap, and classification into previously unseen classes. We supported our point by
showing explicitly how the quantum perceptron can learn the logical XOR function.

It is very important to note that learning complex logical functions and classifying data
with an overlap can be performed in the framework of classical learning models, for
example, by support vector machines [5]. However, implementation of any classical model
demands optimization, which rapidly becomes more complicated as the feature space grows,
the so-called curse of dimensionality. The quantum perceptron is immune to this curse,
since its learning rule does not require any optimization. Moreover, none of the
classical learning models can build a third class from data that do not belong to the two
classes observed during the learning stage, while the quantum perceptron performs this
task with no difficulty.

As a final remark we would like to note that a network of classical perceptrons (with a
hidden layer) has been shown to be a universal approximator [13]. This implies that an
arbitrary real-valued function can be simulated by the network and, therefore, the
network is as powerful (in computational capabilities) as a Turing machine. It may be
that a quantum perceptron network is competitive with a quantum computer, or even exceeds
it in computational power.